Aim
▪To provide the information about graphics cards and their software that you need to perform video tracking with the Deep learning technique.
▪This topic does not apply if you track your subject with the Contour-based method.
note Technically, the term GPU refers to the main processor of the graphics card. The graphics card also includes other components, such as the video memory (VRAM), the printed circuit board (PCB), a power management unit, the PCIe connectors and the display connectors. In this topic, GPU refers to the whole physical board that you install in your PC.
GPU microarchitecture
Graphics cards are categorized based on their microarchitecture. Such categories are also known as generations.
To find out which microarchitecture category your GPU belongs to, browse to
https://en.wikipedia.org/wiki/CUDA
▪In the table Compute Capability, GPU semiconductors and Nvidia GPU board products, locate the name of the GPU in the column GeForce. The name under Micro-architecture tells you which generation the GPU belongs to.
▪In the table Compute Capability (CUDA SDK support vs. Microarchitecture) you can see whether that GPU generation works with the CUDA SDK version used with EthoVision XT - currently 13.1.
GPU speed
When you compare two GPUs for deep learning applications, one of the best indicators is the memory bandwidth, rather than single properties like the GPU memory speed. A GPU's memory bandwidth determines how fast it can move data between the memory (VRAM) and the computation cores.
Theoretically, the memory bandwidth depends on:
▪The memory clock speed (in Hz).
▪The width of the data bus between the card memory and the graphics processor, in bits (also known as memory interface). This is the number of bits that the bus can transfer in each clock cycle.
▪The memory type (the so-called memory clock type multiplier).
Memory bandwidth is measured in GB/s. For example, a graphics card with a memory bandwidth of 1000 GB/s is expected to perform about two times faster than one with a memory bandwidth of 500 GB/s. This is just a rough indication of the difference in performance, as the real performance may depend on several other factors, including the neural network architecture. If no data is read from memory in a particular clock cycle, that cycle's worth of bandwidth goes unused; it cannot be stored and used later when memory pressure is higher. Because GPUs tend to acquire data in spurts, the usable bandwidth is generally lower than the theoretical bandwidth.
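The relation between the factors listed above can be sketched as a short calculation. This is a minimal illustration, not a measurement: the function name is hypothetical, and the bus widths and per-pin data rates are assumptions taken from public specification listings, chosen because they reproduce the bandwidth values of the tested cards. The per-pin data rate is the memory clock multiplied by the memory type multiplier.

```python
def memory_bandwidth_gb_per_s(bus_width_bits: int, data_rate_gbit_per_pin: float) -> float:
    """Theoretical peak memory bandwidth in GB/s.

    data_rate_gbit_per_pin is the effective data rate per pin in Gbit/s,
    that is, the memory clock multiplied by the memory type multiplier.
    """
    return bus_width_bits * data_rate_gbit_per_pin / 8  # 8 bits per byte


# Assumed values: (bus width in bits, effective data rate in Gbit/s per pin).
# They are not stated in this topic; check them against the card's specifications.
cards = {
    "T1000": (128, 10.0),
    "RTX 1000 Mobile": (96, 16.0),
    "GeForce RTX 5070": (192, 28.0),
}

for name, (bus_width, data_rate) in cards.items():
    bandwidth = memory_bandwidth_gb_per_s(bus_width, data_rate)
    print(f"{name}: {bandwidth:.0f} GB/s")
```

The result is a theoretical peak. As explained above, the usable bandwidth is generally lower.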
Another useful measure is the number of Floating Point Operations per Second (FLOPS). For example:
▪For the NVIDIA Quadro P2200: 3.8 TFLOPS (one teraFLOPS equals 10^12 FLOPS).
▪For the NVIDIA T1000: 2.5 TFLOPS.
▪For the NVIDIA GeForce RTX 5070: 27.5 TFLOPS.
For the specifications of graphics cards, see the following web site:
https://www.techpowerup.com/gpu-specs/
The following NVIDIA graphics cards have been successfully tested with Deep learning-based tracking.
tip Install the most recent driver version that is available on the NVIDIA web site for that GPU.
| GPU | Architecture | Memory | Bandwidth | Driver version |
| T1000 | Turing | 4 GB GDDR6 | 160 GB/s | 553.09 |
| GeForce RTX 5070 | Blackwell | 12 GB GDDR7 | 672 GB/s | 572.16 |
Desktop
note The GeForce RTX 5070 is a dual-slot GPU, that is, it takes up the space of two expansion slots. Moreover, there are several versions of this board, with two or three fans. Make sure that the version you choose fits in your PC. Both the T1000 and the GeForce RTX 5070 fit in the Dell Precision 3680 desktop computer.
| GPU | Architecture | Memory | Bandwidth | Driver version |
| RTX 1000 Mobile | Lovelace | 6 GB GDDR6 | 192 GB/s | 553.09 |
Laptop
note If you install a new GPU on a computer that you have already used for some time to acquire data with Deep learning, you must update the neural network model that is stored on your computer. See Update the neural network model after installing a new GPU.
Performance
A few NVIDIA GPUs were tested in an EthoVision XT experiment with four arenas, one subject per arena, and Deep learning as the body point detection technique. A 5-minute video with a resolution of 1280 x 1024 and a frame rate of 30 fps was used. The average time taken to acquire one trial was as follows:
▪Quadro P2200: 4 min 29 s.
▪T1000: 4 min 24 s.
▪GeForce RTX 5070: 1 min 58 s.
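To compare these results more easily, the acquisition times can be converted to processed frames per second and to a speed factor relative to the real-time duration of the video. The following is a minimal sketch of that arithmetic; the function and variable names are hypothetical, and the only inputs are the video length, the frame rate and the times listed above.

```python
VIDEO_SECONDS = 5 * 60                        # 5-minute test video
FRAME_RATE = 30                               # frames per second
TOTAL_FRAMES = VIDEO_SECONDS * FRAME_RATE     # 9000 frames

# Average acquisition time per trial, in seconds, from the list above.
acquisition_seconds = {
    "Quadro P2200": 4 * 60 + 29,
    "T1000": 4 * 60 + 24,
    "GeForce RTX 5070": 1 * 60 + 58,
}


def frames_per_second(seconds: float) -> float:
    """Average number of video frames processed per second of acquisition."""
    return TOTAL_FRAMES / seconds


def speed_factor(seconds: float) -> float:
    """How many times faster than real time the video was processed."""
    return VIDEO_SECONDS / seconds


for name, seconds in acquisition_seconds.items():
    print(f"{name}: {frames_per_second(seconds):.1f} frames/s, "
          f"{speed_factor(seconds):.2f} x real time")
```

In this test, the Quadro P2200 and the T1000 processed the video only slightly faster than real time, while the GeForce RTX 5070 was roughly 2.5 times faster than real time.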
The driver of the graphics card must support the CUDA component. CUDA is a parallel computing platform and programming model created by manufacturer NVIDIA that helps speed up applications by using the power of graphics cards.
Windows Update appears to offer its own latest driver version for each graphics card. For the cards listed above, the version that Windows Update installs meets the minimum requirement. However, other cards may need a more recent version than the one available through Windows Update. When the driver version is not sufficient to support CUDA, EthoVision XT shows the following message in the Experiment Settings:
The graphics card driver is outdated or not installed, or the graphics card does not support CUDA.
We recommend that you download and install the latest driver from the card's manufacturer. This should make the card work properly with CUDA, provided that the hardware supports that version.
To find out whether your NVIDIA graphics card is CUDA-enabled, refer to this web page:
https://developer.nvidia.com/cuda-gpus
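If the NVIDIA driver is already installed, you can also read the card name and the installed driver version from the command line. The sketch below assumes that the nvidia-smi tool, which is installed together with the NVIDIA driver, can be found on the system path; the function names are hypothetical. It is an optional check and not part of EthoVision XT.

```python
import csv
import io
import shutil
import subprocess

QUERY_FIELDS = "name,driver_version,memory.total"


def parse_gpu_csv(text: str) -> list:
    """Parse the output of 'nvidia-smi --format=csv,noheader' into dictionaries."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [
        {"name": row[0], "driver_version": row[1], "memory_total": row[2]}
        for row in rows
        if len(row) == 3  # skip empty or unexpected lines
    ]


def query_gpus() -> list:
    """Return one dictionary per NVIDIA GPU, or an empty list if the query fails."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []  # tool not found: driver missing or not on the system path
    result = subprocess.run(
        [exe, f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)


gpus = query_gpus()
if not gpus:
    print("No NVIDIA GPU reported; check that the NVIDIA driver is installed.")
for gpu in gpus:
    print(f"{gpu['name']}: driver {gpu['driver_version']}, {gpu['memory_total']}")
```

The driver version reported here is the value to compare with the driver versions in the tables of tested cards.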
CUDA version
If your GPU is already installed on the PC, follow these steps to find out which version of CUDA is installed.
1.In Windows Explorer, locate the folder C:\ProgramData\Noldus\Components\Ethovision\TrackerInterfaceNN\[version number]
2.In that folder you find a file named CudaReport.txt.
3.Open this file. The runtime version should give an indication of the version of CUDA currently present on your PC.
If you do not see the folder specified above, create an experiment and, in the Experiment Settings under Body Point Detection Technique, select Deep Learning. Next, check the folder in step 1 again.
note The version of CUDA is not the same thing as the version of the driver of the GPU!
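The steps above can also be carried out with a small script. This is a sketch under assumptions: the layout of CudaReport.txt is not documented here, so the script simply prints every line that contains the word "version" and assumes the file is plain text that can be read as UTF-8. The function names are hypothetical.

```python
from pathlib import Path

# Folder from step 1; each version number has its own subfolder.
DEFAULT_BASE = Path(r"C:\ProgramData\Noldus\Components\Ethovision\TrackerInterfaceNN")


def find_cuda_reports(base=DEFAULT_BASE) -> list:
    """Return the CudaReport.txt files found in the version subfolders of base."""
    base = Path(base)
    if not base.is_dir():
        return []  # folder not created yet; see the remark above
    return sorted(base.glob("*/CudaReport.txt"))


def version_lines(report) -> list:
    """Return the lines of a report that mention a version (assumed text layout)."""
    text = Path(report).read_text(encoding="utf-8", errors="replace")
    return [line.strip() for line in text.splitlines() if "version" in line.lower()]


reports = find_cuda_reports()
if not reports:
    print("No CudaReport.txt found.")
for report in reports:
    print(report)
    for line in version_lines(report):
        print("   " + line)
```

If the script finds no report, select Deep Learning in the Experiment Settings as described above and run it again.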
Which GPU should I choose?
▪In any case, choose a recent NVIDIA GPU of the 2000, 3000, 4000 or 5000 series. However, a very recent GPU or one of a new generation may not be compatible. Contact Noldus if in doubt.
▪In principle, the minimum specifications are a graphics card of the Turing microarchitecture category or newer, with 4 GB memory, 1024 CUDA cores and a memory bandwidth of at least 140 GB/s. The Pascal architecture (GTX 10 series) is no longer supported for deep learning features. Click one of the links below depending on the camera you use. In the table that appears, locate the row with a specific graphics card and DL (Deep learning) specified in the rightmost column:
▪The number of CUDA cores can be a good indicator of performance if you compare GPUs within the same microarchitecture (generation). However, when you compare cards of different generations (for example, Pascal vs. Turing), the difference in the number of CUDA cores does not predict the actual difference in performance. An older card with more CUDA cores may not perform as well as a more recent card with fewer CUDA cores.
▪Overall, newer generations perform better; for example, an Ampere GPU should perform better than a Turing GPU.
▪GPUs of the Maxwell generation / 900 series and Pascal generation / GTX 10 series are not supported with EthoVision XT 19 for deep learning features.
▪Note that a more powerful GPU requires more power from the PC. Make sure that your PC has enough power to feed the GPU. See the prerequisites in Install a graphics card for Deep learning.
▪The performance of a graphics card depends not only on its own characteristics; it also depends heavily on the sample rate and the resolution chosen in EthoVision XT. See Video source.
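The minimum specifications in the list above can be written down as a simple check. This is only a sketch: the class and function names are hypothetical, the 140 GB/s figure is read here as a lower bound (an assumption), and the CUDA core counts and the Quadro P2200 figures in the examples are assumed from public specification listings. Passing this check does not guarantee compatibility; contact Noldus if in doubt.

```python
from dataclasses import dataclass

# NVIDIA consumer generations from oldest to newest (assumed ordering).
GENERATIONS = ["Maxwell", "Pascal", "Volta", "Turing", "Ampere", "Lovelace", "Blackwell"]

MIN_GENERATION = "Turing"
MIN_MEMORY_GB = 4
MIN_CUDA_CORES = 1024
MIN_BANDWIDTH_GB_S = 140  # read as a lower bound; this is an assumption


@dataclass
class Gpu:
    name: str
    generation: str
    memory_gb: float
    cuda_cores: int
    bandwidth_gb_s: float


def shortfalls(gpu: Gpu) -> list:
    """Return the reasons why a GPU misses the minimum specifications."""
    problems = []
    if gpu.generation not in GENERATIONS:
        problems.append(f"unknown generation: {gpu.generation}")
    elif GENERATIONS.index(gpu.generation) < GENERATIONS.index(MIN_GENERATION):
        problems.append(f"{gpu.generation} is older than {MIN_GENERATION}")
    if gpu.memory_gb < MIN_MEMORY_GB:
        problems.append(f"memory below {MIN_MEMORY_GB} GB")
    if gpu.cuda_cores < MIN_CUDA_CORES:
        problems.append(f"fewer than {MIN_CUDA_CORES} CUDA cores")
    if gpu.bandwidth_gb_s < MIN_BANDWIDTH_GB_S:
        problems.append(f"bandwidth below {MIN_BANDWIDTH_GB_S} GB/s")
    return problems


# Example values; core counts and the P2200 figures are assumptions.
examples = [
    Gpu("GeForce RTX 5070", "Blackwell", 12, 6144, 672),
    Gpu("Quadro P2200", "Pascal", 5, 1280, 200),
]

for gpu in examples:
    problems = shortfalls(gpu)
    print(f"{gpu.name}: {'meets the minimum' if not problems else '; '.join(problems)}")
```

The check treats the generation separately from the numeric limits because, as noted above, the number of CUDA cores is not comparable between generations.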
See also
▪Install a graphics card for Deep learning